AITopics | data clustering

data clustering

Data Clustering by Markovian Relaxation and the Information Bottleneck Method

Neural Information Processing Systems | Apr-6-2023, 17:07:25 GMT

We introduce a new, non-parametric and principled, distance-based clustering method. This method combines a pairwise-based approach with a vector-quantization method, which provides a meaningful interpretation of the resulting clusters. The idea is based on turning the distance matrix into a Markov process and then examining the decay of mutual information during the relaxation of this process. The clusters emerge as quasi-stable structures during this relaxation and are then extracted using the information bottleneck method. The method can cluster data with no geometric or other bias and makes no assumption about the underlying distribution.
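The relaxation idea in this abstract can be illustrated with a small sketch. The abstract does not say how distances become transition probabilities, so the exponential kernel `exp(-d / scale)`, the uniform prior over starting points, and all function names below are assumptions made for illustration. The information bottleneck extraction step is not shown.

```python
import math

def transition_matrix(dist, scale=1.0):
    """Turn a distance matrix into a row-stochastic matrix.
    Assumption: weights are exp(-d / scale); the abstract does not fix this."""
    rows = []
    for row in dist:
        weights = [math.exp(-d / scale) for d in row]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

def matmul(a, b):
    """Plain square-matrix product, enough for a toy example."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mutual_information(p_t):
    """I(X; Y_t) in bits, where X is the starting point (uniform prior,
    an assumption) and Y_t is the position after t steps."""
    n = len(p_t)
    marginal = [sum(p_t[i][j] for i in range(n)) / n for j in range(n)]
    mi = 0.0
    for i in range(n):
        for j in range(n):
            p = p_t[i][j]
            if p > 0.0:
                mi += (p / n) * math.log2(p / marginal[j])
    return mi

def relaxation_curve(dist, steps, scale=1.0):
    """Mutual information after 1, 2, ..., steps applications of the chain."""
    p = transition_matrix(dist, scale)
    p_t = p
    curve = []
    for _ in range(steps):
        curve.append(mutual_information(p_t))
        p_t = matmul(p_t, p)
    return curve

# Toy data (hypothetical): two tight pairs of points that are far apart.
dist = [
    [0.0, 1.0, 8.0, 9.0],
    [1.0, 0.0, 9.0, 8.0],
    [8.0, 9.0, 0.0, 1.0],
    [9.0, 8.0, 1.0, 0.0],
]
curve = relaxation_curve(dist, steps=6)
```

Because each step is a further pass through the same Markov chain, the curve cannot increase. In a toy case like this one, the curve would be expected to flatten while the two groups are still distinguishable, which is the kind of quasi-stable structure the abstract refers to.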

data clustering, information bottleneck method, markovian relaxation, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.77)

Size Regularized Cut for Data Clustering

Neural Information Processing Systems | Apr-6-2023, 15:18:59 GMT

We present a novel spectral clustering method that enables users to incorporate prior knowledge of the size of clusters into the clustering process. The cost function, which is named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring the relative size of two clusters. Finding a partition of the data set to minimize SRcut is proved to be NP-complete. An approximation algorithm is proposed to solve a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate that the method is not sensitive to outliers and performs better than normalized cut.

data clustering, size regularized cut, srcut

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.77)

How to innovate my company using Artificial Intelligence?

#artificialintelligence | Dec-2-2021, 03:35:05 GMT

Artificial Intelligence (AI) has been one of the most recurrent topics of conversation and analysis in companies in recent years, especially among those who work on innovation within the company. Faced with an increasingly globalized world in constant digital transformation, more professionals every day agree on the urgent need to incorporate and exploit disruptive technologies such as Artificial Intelligence to achieve the growth planned in the short, medium and long term. For those who are not familiar with the concept, Artificial Intelligence is a segment of the field of computer science. It is defined as the type of technology that allows systems and machines to simulate human intelligence, that is, the ability to make decisions and to simulate actions that a human being would take.

artificial intelligence, innovate, innovation, (10 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Applied AI (0.43)

Coarse-Refinement Dilemma: On Generalization Bounds for Data Clustering

Vaz, Yule, de Mello, Rodrigo Fernandes, Grossi, Carlos Henrique

arXiv.org Machine Learning | Nov-13-2019

This paper is organized as follows: Section 2 briefly introduces some studies related to the formalization of theoretical frameworks in the context of the Data Clustering (DC) problem; Section 3 introduces a general formulation for the DC and HC problems; Section 4 discusses the Coarse-Refinement Dilemma considering the homology group $H_0$; Section 5 shows that homology groups of degree greater than zero are affected by over-refined and over-coarsed topologies; Section 6 compares our proposed generalization bounds to Carlsson and Mémoli [12]'s consistency; finally, conclusions and future directions are provided in Section 8.

2. Related work

Data Clustering (DC) faces many challenges in defining and guaranteeing generalization from datasets, as it does not rely on labels and, consequently, cannot take advantage of any evident error measurement such as risk [7]. While studying this issue, Kleinberg [8] considered a clustering model to be the application of a mapping $f$ on top of a distance function $d: I \times I \to \mathbb{R}$, where $I$ contains the indices of data points in some fixed-size set $S$, disregarding its ambient space [25]. From this initial setup, Kleinberg [8] defined three properties to be respected in order to assess clustering algorithms and models:

- Scale-invariance: Given a distance function $d$, a clustering function $f$, and a scalar $\alpha > 0$, the following must hold: $f(d) = f(\alpha d)$. Thus, the similarity representation over $S$ must be consistent with the units of measurement;

- Consistency: Let $\Gamma$ be a partition of $S$ and $d, d'$ two distance functions. Function $d'$ is referred to as a $\Gamma$-transformation of $d$ if: (i) for all $i, j \in S$ belonging to the same cluster, $d'(i,j) \le d(i,j)$; and (ii) for all $i, j \in S$ belonging to different clusters, $d'(i,j) \ge d(i,j)$. Consistency holds if $f(d') = f(d)$ whenever $d'$ is a $\Gamma$-transformation of $d$.
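The two properties quoted above can be spot-checked numerically on a toy example. The sketch below uses single-linkage clustering stopped at a fixed number of clusters `k` as a stand-in clustering function; that choice, the helper names, and the toy distances are assumptions for illustration and are not taken from the paper. Passing on one example does not prove that a function has either property.

```python
from itertools import combinations

def single_linkage(dist, k):
    """Merge closest pairs (union-find) until k clusters remain.
    Returns the partition as a frozenset of frozensets of indices."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    for _, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())

def scale(dist, alpha):
    """Multiply every distance by alpha (alpha > 0)."""
    return [[alpha * d for d in row] for row in dist]

def gamma_transform(dist, partition, shrink=0.5, stretch=2.0):
    """One particular Gamma-transformation: shrink distances inside each
    cluster of `partition` and stretch distances between clusters."""
    label = {i: c for c, group in enumerate(partition) for i in group}
    n = len(dist)
    return [[dist[i][j] * (shrink if label[i] == label[j] else stretch)
             for j in range(n)] for i in range(n)]

# Toy distances (hypothetical): two tight pairs that are far apart.
dist = [
    [0.0, 1.0, 8.0, 9.0],
    [1.0, 0.0, 9.0, 8.0],
    [8.0, 9.0, 0.0, 1.0],
    [9.0, 8.0, 1.0, 0.0],
]
base = single_linkage(dist, k=2)
scale_invariant_here = single_linkage(scale(dist, 3.5), k=2) == base
consistent_here = single_linkage(gamma_transform(dist, base), k=2) == base
```

Here `scale_invariant_here` checks $f(d) = f(\alpha d)$ for one value of $\alpha$, and `consistent_here` checks $f(d') = f(d)$ for one $\Gamma$-transformation built from the partition that $f$ itself returned.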

dataset, homology class, topological space, (16 more...)

arXiv.org Machine Learning

1911.05806

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland (0.04)
South America > Brazil (0.04)
(5 more...)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Structure Aware L1 Graph for Data Clustering

Han, Shuchu (Stony Brook University) | Qin, Hong (Stony Brook University)

AAAI Conferences | Apr-19-2016

In graph-oriented machine learning research, the L1 graph is an efficient way to represent the connections between input data samples. Its construction algorithm is based on a numerical optimization motivated by Compressive Sensing theory. As a result, it is a nonparametric method, which is highly desirable. However, information about the data, such as geometric structure and density distribution, is ignored. In this paper, we propose a Structure Aware (SA) L1 graph to improve data clustering performance by capturing the manifold structure of the input data. We use a local dictionary for each datum while calculating its sparse coefficients. The SA-L1 graph not only preserves the locality of the data but also captures its geometric structure. The experimental results show that our new algorithm has better clustering performance than the L1 graph.

artificial intelligence, graph, machine learning, (16 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country: North America > United States (0.15)

Genre: Research Report > New Finding (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.73)

Data Clustering by Laplacian Regularized L1-Graph

Yang, Yingzhen (University of Illinois at Urbana-Champaign) | Wang, Zhangyang (University of Illinois at Urbana-Champaign) | Yang, Jianchao (Adobe Research) | Wang, Jiangping (University of Illinois at Urbana-Champaign) | Chang, Shiyu (University of Illinois at Urbana-Champaign) | Huang, Thomas S (University of Illinois at Urbana-Champaign)

AAAI Conferences | Jul-14-2014

L1-Graph has been proven to be effective in data clustering, which partitions the data space by using the sparse representation of the data as the similarity measure. However, the sparse representation is performed for each datum separately without taking into account the geometric structure of the data. Motivated by L1-Graph and manifold learning, we propose Laplacian Regularized L1-Graph (LRℓ1-Graph) for data clustering. The sparse representations of LRℓ1-Graph are regularized by the geometric information of the data so that they vary smoothly along the geodesics of the data manifold by the graph Laplacian according to the manifold assumption. Moreover, we propose an iterative regularization scheme, where the sparse representation obtained from the previous iteration is used to build the graph Laplacian for the current iteration of regularization. The experimental results on real data sets demonstrate the superiority of our algorithm compared to L1-Graph and other competing clustering methods.

artificial intelligence, machine learning, sparse representation, (13 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Illinois > Champaign County > Urbana (0.05)
North America > United States > California > Santa Clara County > San Jose (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Active Data Clustering

Hofmann, Thomas, Buhmann, Joachim M.

Neural Information Processing Systems | Dec-31-1998

Active data clustering is a novel technique for clustering of proximity data which utilizes principles from sequential experiment design in order to interleave data generation and data analysis. The proposed active data sampling strategy is based on the expected value of information, a concept rooted in statistical decision theory. This is considered to be an important step towards the analysis of large-scale data sets, because it offers a way to overcome the inherent data sparseness of proximity data.

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Multidimensional Scaling and Data Clustering

Hofmann, Thomas, Buhmann, Joachim

Neural Information Processing Systems | Dec-31-1995

Visualizing and structuring pairwise dissimilarity data are difficult combinatorial optimization problems known as multidimensional scaling or pairwise data clustering. Algorithms for embedding a dissimilarity data set in a Euclidean space, for clustering these data and for actively selecting data to support the clustering process are discussed in the maximum entropy framework. Active data selection provides a strategy to discover structure in a data set efficiently with partially unknown data.

1 Introduction

Grouping experimental data into compact clusters arises as a data analysis problem in psychology, linguistics, genetics and other experimental sciences. The data to be clustered are either given by an explicit coordinate representation (central clustering) or, in the non-metric case, characterized by dissimilarity values for pairs of data points (pairwise clustering). In this paper we study algorithms (i) for embedding non-metric data in a D-dimensional Euclidean space, (ii) for simultaneous clustering and embedding of non-metric data, and (iii) for active data selection to determine a particular cluster structure with a minimal number of data queries. All algorithms are derived from the maximum entropy principle (Hertz et al., 1991), which guarantees robust statistics (Tikochinsky et al., 1984).

algorithm, dissimilarity, multidimensional scaling, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
